In [1]:
%matplotlib inline
import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)
import matplotlib.pyplot as plt
Alright, now we're going to be working with the running data set, so let's go ahead and import it. We'll see again that it's not converting the dates, so we've got to do that manually. However, it doesn't work exactly like it did last time.
In [2]:
df = pd.read_csv('../data/date_fixed_running_data.csv')
In [3]:
df.head()
Out[3]:
This is because when you save a DataFrame to a CSV, pandas doesn't label the index column, so our date column comes back as 'Unnamed: 0' in position zero. This happened because the index didn't have a name when we saved the last CSV. With parse_dates you can refer to a column either by its name (if it has one) or by its 0-based position.
In [4]:
df['Unnamed: 0'].head()
Out[4]:
In [5]:
df = pd.read_csv('../data/date_fixed_running_data.csv', parse_dates=['Unnamed: 0'])
We can import the date correctly either by specifying the column name, like we did in the last video, or by specifying the 0-based column position.
In [6]:
df = pd.read_csv('../data/date_fixed_running_data.csv', parse_dates=[0])
In [7]:
df.head()
Out[7]:
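As an aside (this isn't something we do in the video), you can avoid the unnamed column entirely by telling read_csv to treat that first column as the index at load time; with index_col set, parse_dates=True parses the index itself:
df = pd.read_csv('../data/date_fixed_running_data.csv', index_col=0, parse_dates=True)
We'll stick with the video's approach below and rename the columns by hand.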
Let’s go ahead and rename the columns and set our index to the dates again.
In [8]:
cols = ['Date', 'Miles', 'Time']
df.columns = cols
df.head()
Out[8]:
We can also rename a single column using the rename method; this is handy when you only want to change one column rather than reassigning all of them.
In [9]:
df.rename(columns={df.columns[0]:'Date'}, inplace=True)
In [10]:
df.head()
Out[10]:
Now we can plot it again once we set the index.
In [11]:
df.set_index('Date', inplace=True)
In [12]:
df.plot()
Out[12]:
Now one thing we’re going to want to do is get a breakdown of times for each run. Stats like minutes per mile and that sort of thing.
To do that we're going to use map. We need to convert our times, which are stored as hours:minutes:seconds, into pure seconds; from there we can derive minutes and so on.
Let's go ahead and write our map function. There is one edge case we need to handle: remember, we don't have time data for every single run, so we've got to handle the NaN values correctly.
We'll do this by just keeping them as NaN.
In [16]:
raw_time_fmt = df.Time[0]

def get_total_seconds(raw_time):
    if raw_time is np.nan:
        return np.nan  # if it's blank, keep it blank
    hrs, mins, seconds = str(raw_time).split(':')
    seconds = int(seconds) + 60 * int(mins) + 60 * 60 * int(hrs)
    return seconds
print(get_total_seconds(raw_time_fmt))
In [17]:
df['Seconds'] = df.Time.map(get_total_seconds)
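As an aside (not what we do in the video), reasonably recent versions of pandas can parse these hour:minute:second strings directly with to_timedelta; missing values come through as NaT and become NaN seconds, so the result should match our Seconds column:
pd.to_timedelta(df['Time']).dt.total_seconds()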
In [18]:
df.head(10)
Out[18]:
Now we can run describe and see that I recorded times for 52 of the runs.
In [19]:
df.describe()
Out[19]:
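If you just want that count of non-missing times on its own, count() skips NaN values:
df['Seconds'].count()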
This is a great example of why np.nan is so useful in pandas. When we run describe, pandas just disregards the missing values, unlike plain NumPy, where a NaN would propagate through the result. If we were to fill them in with zeros, that would drag down our statistics.
We can see that below: look how much the statistics change when the missing times are filled with 0, which would make for incorrect analysis.
In [20]:
df.fillna(0).describe()
Out[20]:
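To make the NumPy comparison concrete, here's a tiny toy example (not a cell from the original notebook):
s = pd.Series([1.0, np.nan, 3.0])
s.mean()            # 2.0 -- pandas skips NaN by default
np.mean(s.values)   # nan -- a plain NumPy mean does not
s.fillna(0).mean()  # ~1.33 -- filling with zeros drags the mean down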
Now that we have seconds it would be useful to see minutes too. I find it hard to think in seconds.
So let’s add a new column.
In [21]:
df['Minutes'] = df['Seconds'].map(lambda x: x / 60)
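Note that map isn't strictly necessary here; division on a Series is vectorized and leaves NaN as NaN, so this one-liner does the same thing:
df['Minutes'] = df['Seconds'] / 60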
In [22]:
df.describe()
Out[22]:
Now that we've run describe again, let's plot it.
In [23]:
df.plot(x='Miles', y='Minutes', kind='scatter')
Out[23]:
Let's plot Miles and Minutes together in a scatter plot. Wow, that's linear. Let's see how correlated they are; we do this with the corr method. We can see that miles and time are very tightly correlated (this uses the Pearson correlation coefficient by default). There are two other correlation methods you can use: Kendall's tau and Spearman rank correlation.
In [24]:
df.corr()
Out[24]:
In [25]:
df.corr(method='kendall')
Out[25]:
In [26]:
df.corr(method='spearman')
Out[26]:
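If you only care about a single pair rather than the whole matrix, Series.corr takes the same method argument:
df['Miles'].corr(df['Minutes'])                     # Pearson by default
df['Miles'].corr(df['Minutes'], method='spearman')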
Now let's see a box plot. Between these two plots we get a much better idea of the data: most of my runs are below an hour, except for a couple that are much longer.
In [27]:
df.boxplot('Minutes', return_type='axes')
Out[27]:
Now let's add minutes per mile; we can just divide our two Series to get those numbers.
In [28]:
df['Minutes'] / df['Miles']
Out[28]:
In [29]:
df['Min_per_mile'] = df['Minutes'] / df['Miles']
In [30]:
df.describe()
Out[30]:
We can see that at shorter distances, my speed varies a lot more.
In [31]:
df.plot(x='Miles', y='Min_per_mile', kind='scatter')
plt.ylabel("Minutes / Mile")
Out[31]:
Let's see a histogram of my speeds. Histograms are a great way of representing frequency data, or how often certain values occur.
In [32]:
df.hist('Min_per_mile')
Out[32]:
Seems pretty centered in that 7 to 7.5 minute range. Let's see if we can get more information with more bins, which we specify with the bins argument.
In [33]:
df.hist('Min_per_mile',bins=20)
Out[33]:
That's interesting: just under 7 and right around 7.5 are the most common. I bet that has something to do with my running distances, or the courses I choose to run.
In [34]:
df.hist('Min_per_mile',bins=20, figsize=(10,8))
plt.xlim((5, 11))
plt.ylim((0, 12))
plt.title("Minutes Per Mile Histogram")
plt.grid(False)
plt.savefig('../assets/minutes_per_mile_histogram.png')
In [35]:
df['Miles']
Out[35]:
Now another cool thing you can do with time series is compute the rolling mean, rolling sum, or even rolling correlations. There are a lot of different "rolling" operations you can do.
In [36]:
df['Miles'].plot()
Out[36]:
So here's a standard plot of our Miles again, just a line over time. To add another line to the same figure, we just call plot again in the same cell. As I was touching on rolling values, let's talk about the rolling average. To compute it, we pass in a Series or a DataFrame.
In [37]:
df['Miles'].plot()
pd.rolling_mean(df['Miles'], 7).plot()
Out[37]:
I can do the same with the rolling standard deviation or sum.
In [38]:
df['Miles'].plot()
pd.rolling_std(df['Miles'], 7).plot()
Out[38]:
In [39]:
df['Miles'].plot()
pd.rolling_sum(df['Miles'], 7).plot()
Out[39]:
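A quick note if you're on a newer version of pandas: the top-level pd.rolling_* helpers were eventually removed in favor of the .rolling() method, which computes the same thing:
df['Miles'].rolling(7).mean().plot()
df['Miles'].rolling(7).std().plot()
df['Miles'].rolling(7).sum().plot()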
Now, as a last note, one thing that's cool about datetime indexes is that you can query them very naturally. If I want to get all my runs in October of 2014, I can just enter that as a string.
In [40]:
df.index
Out[40]:
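For that October query, a partial date string with loc does the trick (this isn't a cell from the original notebook):
df.loc['2014-10']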
If I want everything from November through December, I can do that with a slice.
In [41]:
df['2014-11':'2014-12']
Out[41]:
How do you think we might go from November through January 1, 2015?
Go ahead and give it a try and see if you can figure it out.
In [42]:
df['2014-11':'2015-1-1']['Miles'].plot()
Out[42]:
Now, we can slice a range this way, but grabbing a single specific date's run doesn't work the same way.
In [43]:
df['2014-8-12']
To do that we need to use loc.
In [44]:
df.loc['2014-8-12']
Out[44]:
Now that we've done all this work, we should save it so that we don't have to remember what our operations were or what stage we did them at. We could save it to CSV like we did the other one, but I want to illustrate the different ways you can save this file.
Let's save our CSV, but we can also save it as an HTML page (which will give us a table view) or a JSON file.
In [45]:
df.head()
Out[45]:
In [46]:
df.to_csv('../data/date_fixed_running_data_with_time.csv')
In [47]:
df.to_html('../data/date_fixed_running_data_with_time.html')
One thing to note with JSON files is that they want unique indexes (because the index values become the keys), so we've got to give it a new index. We can do this by resetting our index, or by copying the index into a column and replacing it.
In [48]:
df.to_json('../data/date_fixed_running_data_with_time.json')
In [49]:
df.reset_index()
Out[49]:
In [50]:
df['Date'] = df.index
In [51]:
df.index = range(df.shape[0])
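For what it's worth, reset_index can do both of those steps in one go if you assign the result back (it returns a new frame by default):
df = df.reset_index()  # 'Date' moves back to a regular column, index becomes 0..n-1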
In [52]:
df.head()
df.to_json('../data/date_fixed_running_data_with_time.json')
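Another option worth knowing about, though it's not what we use here: to_json takes an orient argument, and orient='records' writes one JSON object per row and leaves the index out entirely:
df.to_json('../data/date_fixed_running_data_with_time.json', orient='records')  # just for illustration; overwrites the same file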
Now there's a LOT more you can do with datetime indexing, but this is about all I wanted to cover in this video. We will get into more specifics later. By now you should be getting a lot more familiar with pandas and the IPython + pandas workflow.
In [53]:
df.Date[0]
Out[53]: